Structure and Stability of Internet Top Lists
Abstract
Active Internet measurement studies rely on a list of targets to be scanned. While probing the entire IPv4 address space is feasible for scans of limited complexity, more complex scans do not scale to measuring the full Internet. Thus, a sample of the Internet can be used instead, often in the form of a “top list”. The most widely used list is the Alexa Global Top1M list. Despite their prevalence, the use of top lists is seldom questioned. Little is known about their creation, representativity, potential biases, stability, or overlap between lists. As a result, the potential consequences of applying top lists in research are not known. In this study, we aim to open the discussion on top lists by investigating the aptness of frequently used top lists for empirical Internet scans, including stability, correlation, and potential biases of such lists.

1. LISTS OF POPULAR DOMAINS

Internet top lists contain frequently accessed domains according to typically proprietary data held by the list creator. The following lists are widely used:

Alexa top lists [1] are created based on usage data collected by the Alexa browser plugin. No information exists on the plugin’s user base, which raises questions about list representativity and potential biases (towards the plugin’s unknown user base). Alexa lists are offered for sale with few free offerings. The most popular free offering is the list of the global top 1M domains. Paid offerings include top lists per country, industry, or region. For each, the top 50 entries can be viewed free of charge.

Quantcast [4] provides a list of the top 1M most frequently visited web sites per country, measured through their web intelligence plugin on sites. Only the US-based list can be downloaded; all other lists can only be viewed online, and their ranks are hidden unless purchased. Thus, we do not systematically download and evaluate the Quantcast lists.
Majestic Million [3] offers a Creative Commons licensed top 1M list based on Majestic’s web crawler, which ranks sites by the number of subnets linking to that site. This is a different data collection methodology, but, similar to Alexa, it is heavily web-focused.

Cisco Umbrella [2] contains the list of top 1M domains (including subdomains) according to DNS queries by users of Cisco’s OpenDNS system. This is fundamentally different in nature from collecting web site visits or links, as it is based on DNS requests for all kinds of Internet services, not just web sites.

Top List Use in Research. We start by studying the use of top lists in measurement research at three Internet measurement conferences (IMC, PAM, and TMA) in 2015–2017. Out of 260 papers published at these conferences, 56 (21.5%) utilize a top list (see Table 1). All 56 papers use an Alexa list, while two papers additionally use either the Cisco Umbrella or the Quantcast list. Of these 56 papers, 48 use the global list, 8 a country-specific list, and 6 categorical lists, with some papers using several of these. Two papers state only that they use “the Alexa list”.

Table 1: Use of top lists in 260 papers published at IMC, PAM, and TMA in 2015, 2016, and 2017.

# Papers
List            IMC   PAM   TMA
Alexa [1]        35    13     8
Umbrella [2]      1     –     –
Quantcast [4]     1     –     –

Alexa only, global top
500     7
1k      5
10k     3
100k    4
1M     27

Alexa only, specific
Country     8
Category    6
Global     48

2. STRUCTURE: SUBDOMAIN DEPTH

Top lists vary in the level of detail they provide in terms of subdomain depth. For example, for www.net.in.tum.de, .de is the public suffix, tum.de is the base domain, in.tum.de is the first subdomain, and net.in.tum.de is the second subdomain. We count the list entry www.net.in.tum.de as a third-level subdomain. Table 2 shows the average number of base domains (μBD) per top list. We note that Alexa and Majestic contain almost exclusively base domains, with few exceptions (e.g., for blogspot).
In contrast, Umbrella contains only 28% base domains on average. Table 3 details the subdomain depth for a single-day snapshot of all lists. Umbrella holds deep subdomain levels, up to level 33 (an IPv6 rDNS pointer). We also note that the base domain is usually part of the list when its subdomains are listed: on average, each list contains only a few hundred subdomains whose base domain is not part of the list.

Do you archive a top list and want to help future studies by sharing it with us? Then please contact us!

arXiv:1802.02651v1 [cs.NI] 7 Feb 2018
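The subdomain-depth counting described in Section 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the names `PUBLIC_SUFFIXES`, `split_domain`, and `summarize` are hypothetical, and the three-entry suffix set stands in for the full Public Suffix List that a real analysis would need. The sketch reports the depth of each entry, the share of base domains in a list, and the subdomains whose base domain is missing from the list.

```python
from collections import Counter

# Tiny illustrative suffix set (assumption); a real analysis would load the
# full Public Suffix List instead.
PUBLIC_SUFFIXES = {"de", "com", "co.uk"}


def split_domain(name, suffixes=PUBLIC_SUFFIXES):
    """Return (base_domain, depth) for a list entry.

    Depth 0 is a base domain such as tum.de; www.net.in.tum.de has depth 3.
    Returns None if no suffix matches or the entry is itself a public suffix.
    """
    labels = name.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before a shorter suffix could be.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the entry is a bare public suffix
            base = ".".join(labels[i - 1:])
            return base, i - 1
    return None


def summarize(entries, suffixes=PUBLIC_SUFFIXES):
    """Return (depth histogram, share of base domains, orphaned subdomains)."""
    parsed = [(e.lower().rstrip("."), split_domain(e, suffixes)) for e in entries]
    parsed = [(e, p) for e, p in parsed if p is not None]
    depth_counts = Counter(depth for _, (_, depth) in parsed)
    listed = {e for e, _ in parsed}
    # Subdomains whose base domain does not itself appear in the list.
    orphans = sorted(
        e for e, (base, depth) in parsed if depth > 0 and base not in listed
    )
    base_share = depth_counts[0] / len(parsed) if parsed else 0.0
    return depth_counts, base_share, orphans


if __name__ == "__main__":
    sample = ["tum.de", "www.net.in.tum.de", "in.tum.de", "mail.example.com"]
    counts, share, orphans = summarize(sample)
    print(dict(counts), share, orphans)
```

Matching the longest suffix first matters for multi-label suffixes such as co.uk, where a naive "last label is the suffix" rule would misclassify bbc.co.uk as a subdomain.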
Similar resources
Addiction to Internet, Modeling and Control
In this paper, we propose a mathematical model of addiction to the Internet. We investigate the effect of educational programs on the control of the addiction. We compute the basic reproduction number. Using the Chavez-Song theorem, we show the occurrence of backward bifurcation. We prove the global stability of the equilibrium points using the geometric stability method and a Lyapunov function.
Internet Addiction Based on Personality Characteristics of High School Students in Kerman, Iran
Background: The new phenomenon of Internet addiction among teenagers and young adults is one of the modern addictions in industrial and post-industrial societies. The purpose of this research was to predict the Internet addiction based on the personality characteristics of high school students in Kerman. Methods: This research was a descriptive correlational study. The statistical population in...
An investigation on the effect of alumina on hydrothermal stability of nanostructured silica membrane prepared by sol-gel method
In the present study, the effect of alumina on the pore structure and hydrothermal stability of nanostructured silica was investigated. SiO2 and SiO2-15wt%Al2O3 membranes were prepared by dip coating on mesoporous γ-Al2O3 coated macroporous α-alumina support. The particle sizes of the sol were increased by adding alumina to silica...
How International Financial Crisis Affects Industries in Beijing, Capital City of China
Beijing's industrial structure is service-oriented with a high degree of economic openness, and GDP has maintained rapid growth. This paper analyzes the international financial crisis and the impact of China's anti-crisis policies on Beijing's economic development as well as its transmission mechanism. Impact Index (excluding seasonal factors) and ARMA Model are employed in the empirical...
Get a Way Back: Evaluating Retrieval from History Lists
Many user interfaces include history lists that help users retrieve temporally ordered information such as previously visited web pages, email messages, and recently used files. Two main types of history lists are widely used. The first type, typified by Netscape Navigator’s history list, provides a linear temporally ordered list. The second type, typified by Microsoft Internet Explorer’s histo...
Journal: CoRR
Volume: abs/1802.02651
Publication date: 2018